Goto

Collaborating Authors

 Training Manual






MMPlanner: Zero-Shot Multimodal Procedural Planning with Chain-of-Thought Object State Reasoning

Tabassum, Afrina, Guo, Bin, Ma, Xiyao, Eldardiry, Hoda, Lourentzou, Ismini

arXiv.org Artificial Intelligence

Multimodal Procedural Planning (MPP) aims to generate step-by-step instructions that combine text and images, with the central challenge of preserving object-state consistency across modalities while producing informative plans. Existing approaches often leverage large language models (LLMs) to refine textual steps; however, visual object-state alignment and systematic evaluation are largely underexplored. We present MMPlanner, a zero-shot MPP framework that introduces Object State Reasoning Chain-of-Thought (OSR-CoT) prompting to explicitly model object-state transitions and generate accurate multimodal plans. To assess plan quality, we design LLM-as-a-judge protocols for planning accuracy and cross-modal alignment, and further propose a visual step-reordering task to measure temporal coherence. Experiments on RECIPEPLAN and WIKIPLAN show that MMPlanner achieves state-of-the-art performance, improving textual planning by +6.8%, cross-modal alignment by +11.9%, and visual step ordering by +26.7%


Advancing Responsible Innovation in Agentic AI: A study of Ethical Frameworks for Household Automation

Chandra, Joydeep, Navneet, Satyam Kumar

arXiv.org Artificial Intelligence

The implementation of Artificial Intelligence (AI) in household environments, especially in the form of proactive autonomous agents, brings about possibilities of comfort and attention as well as it comes with intra or extramural ethical challenges. This article analyzes agentic AI and its applications, focusing on its move from reactive to proactive autonomy, privacy, fairness and user control. We review responsible innovation frameworks, human-centered design principles, and governance practices to distill practical guidance for ethical smart home systems. Vulnerable user groups such as elderly individuals, children, and neurodivergent who face higher risks of surveillance, bias, and privacy risks were studied in detail in context of Agentic AI. Design imperatives are highlighted such as tailored explainability, granular consent mechanisms, and robust override controls, supported by participatory and inclusive methodologies. It was also explored how data-driven insights, including social media analysis via Natural Language Processing(NLP), can inform specific user needs and ethical concerns. This survey aims to provide both a conceptual foundation and suggestions for developing transparent, inclusive, and trustworthy agentic AI in household automation.


A Step-by-Step Guide to Creating a Robust Autonomous Drone Testing Pipeline

Jiang, Yupeng, Deng, Yao, Schroder, Sebastian, Liang, Linfeng, Gambhir, Suhaas, James, Alice, Seth, Avishkar, Pirrie, James, Zhang, Yihao, Zheng, Xi

arXiv.org Artificial Intelligence

Autonomous drones are rapidly reshaping industries ranging from aerial delivery and infrastructure inspection to environmental monitoring and disaster response. Ensuring the safety, reliability, and efficiency of these systems is paramount as they transition from research prototypes to mission-critical platforms. This paper presents a step-by-step guide to establishing a robust autonomous drone testing pipeline, covering each critical stage: Software-in-the-Loop (SIL) Simulation Testing, Hardware-in-the-Loop (HIL) Testing, Controlled Real-World Testing, and In-Field Testing. Using practical examples, including the marker-based autonomous landing system, we demonstrate how to systematically verify drone system behaviors, identify integration issues, and optimize performance. Furthermore, we highlight emerging trends shaping the future of drone testing, including the integration of Neurosymbolic and LLMs, creating co-simulation environments, and Digital Twin-enabled simulation-based testing techniques. By following this pipeline, developers and researchers can achieve comprehensive validation, minimize deployment risks, and prepare autonomous drones for safe and reliable real-world operations.


Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Lennon, Connor, Rubin, Edward, Waddell, Glen

arXiv.org Machine Learning

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS$\unicode{x2014}$or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings$\unicode{x2014}$and their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates$\unicode{x2014}$potentially exceeding the bias of endogenous OLS.


Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Wen, Xueru, Lou, Jie, Li, Zichao, Lu, Yaojie, Yu, Xing, Ji, Yuqiu, Xu, Guohai, Lin, Hongyu, He, Ben, Han, Xianpei, Sun, Le, Zhang, Debing

arXiv.org Artificial Intelligence

Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. We systematically evaluate open-source discriminative and generative RMs on CheemsBench and observe significant limitations in their ability to capture human preferences in Chinese scenarios. Additionally, based on CheemsPreference, we construct an RM that achieves state-of-the-art performance on CheemsBench, demonstrating the necessity of human supervision in RM training. Our findings reveal that scaled AI-generated data struggles to fully capture human preferences, emphasizing the importance of high-quality human supervision in RM development.


Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Tensorflow Pretrained Models

Chen, Keyu, Bi, Ziqian, Niu, Qian, Liu, Junyu, Peng, Benji, Zhang, Sen, Liu, Ming, Li, Ming, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Feng, Pohsun

arXiv.org Artificial Intelligence

The application of TensorFlow pre-trained models in deep learning is explored, with an emphasis on practical guidance for tasks such as image classification and object detection. The study covers modern architectures, including ResNet, MobileNet, and EfficientNet, and demonstrates the effectiveness of transfer learning through real-world examples and experiments. A comparison of linear probing and model fine-tuning is presented, supplemented by visualizations using techniques like PCA, t-SNE, and UMAP, allowing for an intuitive understanding of the impact of these approaches. The work provides complete example code and step-by-step instructions, offering valuable insights for both beginners and advanced users. By integrating theoretical concepts with hands-on practice, the paper equips readers with the tools necessary to address deep learning challenges efficiently.